In this paper, we study the \underline{R}obust \underline{o}ptimization for \underline{se}quence \underline{Net}worked \underline{s}ubmodular maximization (RoseNets) problem. We interweave robust optimization with sequence networked submodular maximization. The elements are connected by a directed acyclic graph, and the objective function is submodular not on the elements but on the edges of the graph. In such a networked submodular scenario, the impact of removing an element from a sequence depends on both its position in the sequence and its position in the network, which makes existing robust algorithms inapplicable. We take the first step toward studying the RoseNets problem and design a robust greedy algorithm that is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends on both the number of removed elements and the network topology. We further conduct experiments on real applications of recommendation and link prediction, and the experimental results demonstrate the effectiveness of the proposed algorithm.
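As a rough illustration of the setting (not the paper's RoseNets algorithm), the sketch below runs a plain greedy selection over a DAG whose edges carry utilities: a candidate's marginal gain is measured on the edges that become active once both endpoints appear in the sequence in order. The edge-activation rule, the `utility` attribute, and all function names are assumptions made for this sketch; the robustness to element removal that the paper analyzes is deliberately left out.

```python
import networkx as nx

def active_edge_value(graph, sequence):
    """Sum utilities of DAG edges (u, v) whose endpoints both appear in the
    sequence with u placed before v (a simple, assumed edge-activation rule)."""
    position = {v: i for i, v in enumerate(sequence)}
    return sum(
        data.get("utility", 0.0)
        for u, v, data in graph.edges(data=True)
        if u in position and v in position and position[u] < position[v]
    )

def greedy_sequence(graph, budget):
    """Plain (non-robust) greedy: repeatedly append the element with the largest
    marginal gain in active-edge value. Ties (e.g., the very first pick, which
    activates no edge yet) are broken by element order. The RoseNets algorithm
    additionally hedges against removal of selected elements, which this
    sketch does not model."""
    sequence, remaining = [], set(graph.nodes)
    for _ in range(budget):
        base = active_edge_value(graph, sequence)
        best, best_gain = None, float("-inf")
        for candidate in sorted(remaining):
            gain = active_edge_value(graph, sequence + [candidate]) - base
            if gain > best_gain:
                best, best_gain = candidate, gain
        if best is None:
            break
        sequence.append(best)
        remaining.discard(best)
    return sequence

# Toy usage: a 3-node chain a -> b -> c with unit edge utilities.
g = nx.DiGraph()
g.add_edge("a", "b", utility=1.0)
g.add_edge("b", "c", utility=1.0)
print(greedy_sequence(g, budget=3))   # e.g. ['a', 'b', 'c']
```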
Long-document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder. In contrast to passages or sentences, however, retrieval over long documents suffers from the scope hypothesis: a long document may cover multiple topics. This magnifies their structural heterogeneity and poses a granularity-mismatch issue, leading to inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces globally consistent representations across different levels of fine granularity and then applies multi-granular aligned distillation during training only. In experiments, we evaluate our framework on two long-document retrieval benchmarks and show state-of-the-art performance.
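Since the abstract only names the distillation step, here is a minimal, hypothetical sketch of one ingredient such a framework could rely on: listwise KL distillation of a student retriever's candidate scores toward a cross-encoder teacher, applied at more than one granularity (e.g., whole-document and passage-pooled scores). The shapes, the temperature, and the two-granularity setup are illustrative assumptions, not FGD's actual loss.

```python
import torch
import torch.nn.functional as F

def listwise_distill(student_scores, teacher_scores, temperature=1.0):
    """Match the student's ranking distribution over candidates to the
    teacher's with KL divergence (a generic recipe, not FGD's exact loss)."""
    teacher = F.softmax(teacher_scores / temperature, dim=-1)
    student = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Assumed toy shapes: for one query, scores over N candidate documents,
# computed at two granularities (whole-document and passage-pooled).
N = 8
doc_student = torch.randn(1, N, requires_grad=True)   # dense-retriever scores
psg_student = torch.randn(1, N, requires_grad=True)
doc_teacher, psg_teacher = torch.randn(1, N), torch.randn(1, N)  # cross-encoder scores
loss = listwise_distill(doc_student, doc_teacher) + listwise_distill(psg_student, psg_teacher)
loss.backward()
```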
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practices as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible, and robust, and the underlying software framework must be aware of the particularities (e.g., geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations, and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by, and receiving contributions from, research, clinical, and industrial teams from around the world who are pursuing applications spanning nearly every aspect of healthcare.
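For readers unfamiliar with the library, a minimal sketch of the kind of workflow MONAI enables is shown below: one of its reference network architectures plus a small intensity-preprocessing pipeline applied to a dummy volume. Argument names follow recent MONAI releases and may differ slightly across versions; the channel layout and network hyperparameters are illustrative choices, not recommendations.

```python
import numpy as np
import torch
from monai.networks.nets import UNet
from monai.transforms import Compose, EnsureType, ScaleIntensity

# A 3D U-Net from MONAI's collection of reference architectures.
model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
    num_res_units=2,
)

# Imaging-oriented preprocessing: rescale intensities, convert to a tensor.
preprocess = Compose([ScaleIntensity(), EnsureType()])

volume = np.random.rand(1, 64, 64, 64).astype("float32")  # stand-in for a loaded scan (C, D, H, W)
batch = preprocess(volume).unsqueeze(0)                    # -> shape (1, 1, 64, 64, 64)
with torch.no_grad():
    logits = model(batch)
print(logits.shape)                                        # torch.Size([1, 2, 64, 64, 64])
```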
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models were fine-tuned only on English data and tested on a variety of target languages. In this paper, we perform cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% of the parameters tuned. Additionally, our analysis demonstrates that prompt tuning yields better cross-lingual transferability of representations on downstream tasks, with better-aligned decision boundaries.
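As a concrete picture of the parameter-efficiency being compared, here is a hedged sketch of soft prompt tuning: a frozen multilingual encoder, a small block of trainable prompt embeddings prepended to the input embeddings, and a light classification head. The model name, prompt length, pooling choice, and head are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SoftPromptClassifier(nn.Module):
    """Frozen multilingual encoder + trainable soft prompt + linear head, so that
    only the prompt and the head (a tiny fraction of all parameters) are updated."""
    def __init__(self, model_name="xlm-roberta-base", prompt_len=20, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        for p in self.encoder.parameters():
            p.requires_grad = False                       # backbone stays frozen
        hidden = self.encoder.config.hidden_size
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        embeds = self.encoder.get_input_embeddings()(input_ids)
        bsz = embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(bsz, -1, -1)
        embeds = torch.cat([prompt, embeds], dim=1)       # prepend the soft prompt
        prompt_mask = torch.ones(
            bsz, self.prompt.size(0), dtype=attention_mask.dtype, device=attention_mask.device
        )
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.encoder(inputs_embeds=embeds, attention_mask=mask)
        return self.head(out.last_hidden_state[:, 0])     # classify from the first prompt position

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = SoftPromptClassifier()
batch = tok(["a sentence to classify"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```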
Rankers play an indispensable role in the de facto "retrieve-and-rerank" pipeline, yet their training still lags behind: they learn from moderate negatives and/or serve merely as auxiliary modules for retrievers. In this work, we first identify two major obstacles to a robust ranker, namely the inherent label noise caused by a well-trained retriever and the non-ideal negatives sampled for a high-capability ranker. We therefore propose multiple retrievers as negative generators to improve the ranker's robustness, where i) the wide range of out-of-distribution label noise renders the ranker robust against each noise distribution, and ii) the diverse negatives, relatively close to the ranker's negative distribution, lead to more challenging training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct extensive experiments on popular passage retrieval benchmarks under various settings, including BM25 reranking, full ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
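To make the "multiple retrievers as negative generators" idea concrete, the sketch below pools hard negatives from several heterogeneous retrievers and trains a ranker with a listwise softmax loss over the mixed candidates. The retriever sources, sampling sizes, and loss form are placeholders, not the R$^2$anker training recipe.

```python
import random
import torch
import torch.nn.functional as F

def mix_negatives(per_retriever_negatives, k_per_retriever=4):
    """Pool hard negatives from multiple retrievers (e.g., BM25 plus several
    dense retrievers) so the ranker sees diverse negative/noise distributions."""
    pool = []
    for negs in per_retriever_negatives:          # one list of candidate doc ids per retriever
        pool.extend(random.sample(negs, min(k_per_retriever, len(negs))))
    return pool

def ranker_loss(scores):
    """Listwise softmax cross-entropy with the positive placed at index 0."""
    target = torch.zeros(scores.size(0), dtype=torch.long)
    return F.cross_entropy(scores, target)

# Toy usage: scores produced by a cross-encoder ranker for [positive] + mixed negatives.
negs = mix_negatives([["d3", "d7", "d9", "d11", "d13"], ["d2", "d7", "d21", "d30"]])
scores = torch.randn(1, 1 + len(negs), requires_grad=True)   # (batch, candidates)
ranker_loss(scores).backward()
```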
The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition. Specifically, regarding robustness, recent studies find that Transformers are inherently more robust than CNNs regardless of the training setup. Moreover, it is believed that such superiority of Transformers should largely be credited to their self-attention-like architectures per se. In this paper, we question that belief by closely examining the design of Transformers. Our findings lead to three highly effective architecture designs for boosting robustness, yet simple enough to be implemented in several lines of code, namely a) patchifying the input image, b) enlarging the kernel size, and c) reducing activation layers and normalization layers. Bringing these components together, we are able to build pure CNN architectures, without any attention-like operations, that are as robust as, or even more robust than, Transformers. We hope this work can help the community better understand the design of robust neural architectures. The code is publicly available at https://github.com/ucsc-vlaa/robustcnn.
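The three design changes are simple enough to show directly; the block below is an illustrative PyTorch rendering of them: (a) a patchified stem, (b) a large depthwise kernel, and (c) a single activation and a single normalization per block. Channel widths, kernel sizes, and block layout are assumptions, not the released robustcnn configuration.

```python
import torch
import torch.nn as nn

class PatchifyStem(nn.Module):
    """(a) Patchify the input: a strided conv that treats each p x p patch as one unit."""
    def __init__(self, in_ch=3, dim=96, patch=8):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        return self.proj(x)

class LargeKernelBlock(nn.Module):
    """(b) Enlarge the kernel (depthwise 11x11 here) and
    (c) keep only one activation and one normalization per block."""
    def __init__(self, dim=96, kernel=11):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm = nn.BatchNorm2d(dim)
        self.pw1 = nn.Conv2d(dim, 4 * dim, 1)
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(4 * dim, dim, 1)

    def forward(self, x):
        return x + self.pw2(self.act(self.pw1(self.norm(self.dw(x)))))

net = nn.Sequential(
    PatchifyStem(), LargeKernelBlock(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(96, 1000)
)
print(net(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 1000])
```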
We present a domain-theoretic framework for validated robustness analysis of neural networks. We first analyze the global robustness of a general class of networks. Then, using the fact that Edalat's domain-theoretic L-derivative coincides with Clarke's generalized gradient, we extend the framework to attack-agnostic local robustness analysis. Our framework is ideal for designing algorithms that are correct by construction. We exemplify this claim by developing a validated algorithm for estimating the Lipschitz constant of feedforward regressors. We prove the completeness of the algorithm over differentiable networks, as well as over general position ReLU networks. We obtain computability results within the framework of effectively given domains. Using our domain model, differentiable and non-differentiable networks can be analyzed uniformly. We implement the algorithm using arbitrary-precision interval arithmetic and present the results of some experiments. Our implementation is also truly validated, as it handles floating-point errors as well.
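The paper's validated estimator is substantially more refined; purely as a point of contrast, the sketch below computes the classical product-of-operator-norms upper bound on a ReLU network's Lipschitz constant in plain floating point. It is a well-known over-approximation, not the Clarke-gradient-based algorithm described here, and it omits the interval-arithmetic handling of rounding error.

```python
import numpy as np

def lipschitz_upper_bound(weights, norm_ord=2):
    """Crude Lipschitz upper bound for a feedforward ReLU regressor: the product
    of the layers' operator norms (ReLU itself is 1-Lipschitz). This is a standard
    over-approximation computed in plain floating point, NOT the validated
    estimator developed in the paper."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, norm_ord)   # spectral norm when norm_ord == 2
    return bound

# Toy 2-layer regressor R^3 -> R.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
print(lipschitz_upper_bound([W1, W2]))
```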
Knowledge distillation has recently become a popular technique for improving the model generalization ability of convolutional neural networks. However, its effect on graph neural networks has been less than satisfactory, since the graph topology and node attributes can change dynamically, and in this case a static teacher model is insufficient to guide student training. In this paper, we tackle this challenge by simultaneously training a group of graph neural networks in an online distillation fashion, where the group knowledge plays the role of a dynamic virtual teacher and the structural changes of the graph neural networks are effectively captured. To improve the distillation performance, two types of knowledge are transferred among the students so that they enhance each other: local knowledge, reflecting the information in the graph topology and node attributes, and global knowledge, reflecting the class predictions. The global knowledge is transferred with KL divergence, as in vanilla knowledge distillation, while the local knowledge is transferred by leveraging an efficient adversarial cyclic learning framework. Extensive experiments verify the effectiveness of our proposed online adversarial distillation approach.
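A minimal sketch of the "group as dynamic virtual teacher" idea for the global knowledge: each student is pulled toward the averaged softened predictions of its peers via a KL term. The temperature, the uniform averaging, and the omission of the adversarial cyclic transfer of local knowledge are simplifications made for this sketch.

```python
import torch
import torch.nn.functional as F

def group_distillation_loss(student_logits, peer_logits_list, temperature=2.0):
    """KL(student || averaged peers) global-knowledge term for one student.
    The dynamic virtual teacher is the mean of the peers' softened predictions."""
    with torch.no_grad():
        peer_probs = torch.stack(
            [F.softmax(p / temperature, dim=-1) for p in peer_logits_list]
        ).mean(dim=0)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, peer_probs, reduction="batchmean") * temperature ** 2

# Toy usage: three students' node-classification logits (num_nodes x num_classes).
logits = [torch.randn(100, 7, requires_grad=True) for _ in range(3)]
loss_0 = group_distillation_loss(logits[0], [logits[1], logits[2]])
loss_0.backward()
```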
Feature interaction has been recognized as an important problem in machine learning, and it is also crucial for click-through rate (CTR) prediction tasks. In recent years, deep neural networks (DNNs), which can automatically learn implicit nonlinear interactions from raw sparse features, have been widely used in industrial CTR prediction tasks. However, the implicit feature interactions learned in DNNs cannot fully retain, without loss, the complete representation capacity of the original and empirical feature interactions (e.g., the Cartesian product). For example, simply learning the explicit Cartesian product representation of feature A and feature B, <A, B>, as a new feature can outperform previous implicit feature interaction models, including factorization machine (FM)-based models and their variants. In this paper, we propose a Co-Action Network (CAN) to approximate explicit pairwise feature interactions without introducing too many additional parameters. More specifically, given feature A and its associated feature B, their feature interaction is modeled by learning two sets of parameters: 1) the embedding of feature A, and 2) a multi-layer perceptron (MLP) representing feature B. The approximate feature interaction is obtained by passing the embedding of feature A through the MLP network of feature B. We refer to such pairwise feature interaction as feature co-action, and such a co-action unit provides a very powerful capacity for fitting complex feature interactions. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models and the Cartesian product method. In addition, CAN has been deployed in the display advertising system of Alibaba, obtaining a 12% improvement in CTR and an 8% improvement in revenue per mille (RPM), which is a great improvement to the business.
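A minimal sketch of the co-action unit as described: the embedding of feature B is reshaped into the weights and biases of a tiny MLP, and the embedding of feature A is passed through that MLP. The layer sizes and the tanh nonlinearity are illustrative assumptions, not the production CAN implementation.

```python
import torch
import torch.nn as nn

class CoActionUnit(nn.Module):
    """Feature B's embedding is split into the weights and biases of a tiny MLP,
    and feature A's embedding is passed through that MLP (a sketch of the
    co-action idea, not the production CAN model)."""
    def __init__(self, dim_a=8, hidden=4):
        super().__init__()
        self.dim_a, self.hidden = dim_a, hidden
        # Size of feature B's embedding = parameter count of a dim_a -> hidden -> dim_a MLP.
        self.dim_b = dim_a * hidden + hidden + hidden * dim_a + dim_a

    def forward(self, emb_a, emb_b):
        d, h = self.dim_a, self.hidden
        w1 = emb_b[:, : d * h].view(-1, d, h)
        b1 = emb_b[:, d * h : d * h + h]
        w2 = emb_b[:, d * h + h : d * h + h + h * d].view(-1, h, d)
        b2 = emb_b[:, d * h + h + h * d :]
        x = torch.tanh(torch.bmm(emb_a.unsqueeze(1), w1).squeeze(1) + b1)
        return torch.bmm(x.unsqueeze(1), w2).squeeze(1) + b2

unit = CoActionUnit()
emb_a = torch.randn(32, 8)                   # e.g., user-behavior item embedding
emb_b = torch.randn(32, unit.dim_b)          # e.g., target-ad embedding, reshaped into MLP weights
print(unit(emb_a, emb_b).shape)              # torch.Size([32, 8])
```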